Context and Page Analysis for Improved Web Search

Authors

Steve Lawrence

C. Lee Giles

Abstract

Several popular and useful search engines, such as AltaVista, Excite, HotBot, Infoseek, Lycos, and Northern Light, attempt to maintain full-text indexes of the World Wide Web. However, relying on a single standard search engine has limitations. The standard search engines have limited coverage [1, 2] and outdated databases, and are sometimes unavailable due to problems with the network or the engine itself. The precision of standard engine results can also vary because they generally focus on handling queries quickly and use relatively simple ranking schemes [3]. Rankings can be further muddled by keyword “spamming” to increase a page’s rank order. Often, the relevance of a particular page is obvious only after loading it and finding the query terms.

Metasearch engines, such as MetaCrawler and SavvySearch, attempt to contend with the problem of limited coverage by submitting queries to several standard search engines at once [4, 5]. The primary advantages of metasearch engines are that they combine the results of several search engines and present a consistent user interface [5]. However, most metasearch engines rely on the documents and summaries returned by standard search engines and so inherit their limited precision and vulnerability to keyword spamming.

We developed the NEC Research Institute (NECI) metasearch engine to improve the efficiency and precision of Web search by downloading and analyzing each document and then displaying results that show the query terms in context. This helps users more readily determine if the document is relevant without having to download each page. This technique is simple, yet it can be very effective, particularly when dealing with the Web’s large, diverse, and poorly organized database. Results from the NECI engine are returned progressively after each page is downloaded and analyzed, rather than after all pages are downloaded. Pages are downloaded in parallel and ...
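The two ideas in the abstract, showing query terms in context and returning results progressively while pages download in parallel, can be illustrated with a minimal Python sketch. This is not the NECI engine's implementation. The function names, the `width` parameter, the caller-supplied `fetch` function, and the example pages are all assumptions made for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed


def terms_in_context(text, terms, width=30):
    """Return snippets showing each query term with up to `width`
    characters of surrounding text on each side (case-insensitive)."""
    snippets = []
    for term in terms:
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(match.start() - width, 0)
            end = min(match.end() + width, len(text))
            snippets.append(text[start:end].strip())
    return snippets


def search_progressively(urls, terms, fetch, max_workers=4):
    """Yield (url, snippets) as soon as each page is fetched and analyzed.

    `fetch` is a hypothetical caller-supplied function mapping a URL to
    the page text. Pages are fetched in parallel threads, and results
    arrive in completion order, not in the order of `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            yield futures[future], terms_in_context(future.result(), terms)


# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.com/a": "Metasearch engines combine results from several engines.",
    "http://example.com/b": "This page is about cooking.",
}

if __name__ == "__main__":
    for url, snippets in search_progressively(FAKE_WEB, ["metasearch"], FAKE_WEB.get):
        print(url, snippets)
```

Because `search_progressively` is a generator, a caller can display each result as soon as it is yielded instead of waiting for every download to finish. A page with an empty snippet list contains none of the query terms.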

Similar resources

Image flip CAPTCHA

The massive and automated access to Web resources through robots has made it essential for Web service providers to make some conclusion about whether the "user" is a human or a robot. A Human Interaction Proof (HIP) like Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) offers a way to make such a distinction. CAPTCHA is a reverse Turing test used by Web serv...

Analysis of users’ query reformulation behavior in Web with regard to Wholistic/analytic cognitive styles, Web experience, and search task type

Background and Aim: The basic aim of the present study is to investigate users’ query reformulation behavior with regard to wholistic-analytic cognitive styles, search task type, and experience variables in using the Web. Method: This study is an applied research using survey method. A total of 321 search queries were submitted by 44 users. Data collection tools were Riding’s Cognitive Style A...

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing the search engine results, ranking takes place according to one or more parameters such as; Backward Links, Forward Links, Content, click through rate and etc. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...

Geospatial Web Image Mining

One commonly asked question when confronted with a photograph is “Where is this place?” When talking about a place mentioned on the Web, the question arises “What does this place look like?” Today, these questions can not reliably be answered for Web images as they typically do not explicitly reveal their relationship to an actual geographic position. Analysis of the keywords surrounding the im...

Expert Discovery: A web mining approach

Expert discovery is a quest in search of finding an answer to a question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” Expert with domain knowledge in any field is crucial for consulting in industry, academia and scientific community. Aim of this study is to address the issues for expert-finding task in real-world community. Collabor...

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines used conventional methods to retrieve information on the Web; however, the search results of these engines are still able to be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...

Journal:

IEEE Internet Computing

Volume 2

Publication date: 1998
